Search results for "Graphics processing unit"

showing 10 items of 42 documents

Nvidia CUDA parallel processing of large FDTD meshes in a desktop computer

2020

The finite-difference time-domain (FDTD) numerical method is a well-known and mature technique in computational electrodynamics. FDTD is usually used in the analysis of electromagnetic structures and antennas. However, it still carries a high computational burden, which limits its use in combination with optimization algorithms. FDTD can be parallelized to run on a GPU using MATLAB and CUDA tools. For instance, the simulation of a planar array with a three-dimensional 790x276x588 FDTD mesh, for 6200 time steps, takes one day of elapsed time on the CPU of an Intel Core i3 at 2.4 GHz in a personal computer with 8 GB of RAM. This time is reduced 120 times when the calcula…

020203 distributed computing; Computer science; Finite-difference time-domain method; Graphics processing unit; 02 engineering and technology; Computational science; CUDA; Personal computer; 0202 electrical engineering electronic engineering information engineering; Computational electromagnetics; 020201 artificial intelligence & image processing; Central processing unit; Time domain; MATLAB; computer; computer.programming_language

Proceedings of the 10th Euro-American Conference on Telematics and Information Systems

researchProduct

Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures

2016

This is a post-peer-review, pre-copyedit version of an article published in IEEE Transactions on Parallel and Distributed Systems. The final authenticated version is available online at: http://dx.doi.org/10.1109/TPDS.2015.2460247. [Abstract] Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics, as they can help to explain genetic influences on diseases. As these studies are time-consuming operations, some tools exploit the characteristics of different hardware accelerators (such as GPUs and Xeon Phi coprocessors) to reduce the runtime. Nevertheless, all these approaches are not able t…

0301 basic medicine; Coprocessor; Computer science; 0206 medical engineering; Acceleration; Data models; Symmetric multiprocessor system; Computational modeling; 02 engineering and technology; Parallel computing; Supercomputer; 03 medical and health sciences; Task (computing); 030104 developmental biology; Coprocessors; Computational Theory and Mathematics; Hardware and Architecture; Signal Processing; Genetics; Pairwise comparison; Computer architecture; Graphics processing units; 020602 bioinformatics; Xeon Phi

researchProduct

GSaaS: A Service to Cloudify and Schedule GPUs

2018

Cloud technology is an attractive infrastructure solution that provides customers with an almost unlimited on-demand computational capacity using a pay-per-use approach, and allows data centers to increase their energy and economic savings by adopting a virtualized resource sharing model. However, resources such as graphics processing units (GPUs) have not been fully adapted to this model. Although general-purpose computing on graphics processing units (GPGPU) is becoming more and more popular, cloud providers lack the flexibility to manage accelerators, because of the extended use of peripheral component interconnect (PCI) passthrough techniques to attach GPUs to virtual machines (VMs). F…

0301 basic medicine; Schedule; General Computer Science; Computer science; Distributed computing; networking; Cloud computing; 02 engineering and technology; computer.software_genre; 03 medical and health sciences; GPU resource management; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Cloud computing; General Materials Science; Resource management; platform virtualization; business.industry; cloud computing; General Engineering; Virtualization; Shared resource; 030104 developmental biology; Virtual machine; Scalability; GPU cloudification; lcsh:Electrical engineering. Electronics. Nuclear engineering; General-purpose computing on graphics processing units; business; computer; lcsh:TK1-9971

IEEE Access

researchProduct

GPU-Based Optimisation of 3D Sensor Placement Considering Redundancy, Range and Field of View

2020

This paper presents a novel and efficient solution for the 3D sensor placement problem based on GPU programming and massive parallelisation. Compared to prior art using gradient-search and mixed-integer based approaches, the method presented in this paper returns optimal or good results in a fraction of the time. The presented method allows for redundancy, i.e. requiring selected sub-volumes to be covered by at least n sensors. The presented results are for 3D sensors which have a visible volume represented by cones, but the method can easily be extended to work with sensors having other range and field of view shapes, such as 2D cameras and lidars.

0303 health sciences; 030306 microbiology; Computer science; Volume (computing); 020207 software engineering; Field of view; 02 engineering and technology; 3d sensor; 03 medical and health sciences; Range (mathematics); CUDA; Computer engineering; 0202 electrical engineering electronic engineering information engineering; Redundancy (engineering); Fraction (mathematics); General-purpose computing on graphics processing units

2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA)

researchProduct

CUSHAW2-GPU: Empowering Faster Gapped Short-Read Alignment Using GPU Computing

2014

We present CUSHAW2-GPU to accelerate the CUSHAW2 algorithm using compute unified device architecture (CUDA)-enabled GPUs. Two critical GPU computing techniques, namely intertask hybrid CPU-GPU parallelism and tile-based Smith-Waterman map backtracking using CUDA, are investigated to facilitate fast alignments. By aligning both simulated and real reads to the human genome, our aligner yields comparable or better performance compared to BWA-SW, Bowtie2, and GEM. Furthermore, CUSHAW2-GPU with a Tesla K20c GPU achieves significant speedups over the multithreaded CUSHAW2, BWA-SW, Bowtie2, and GEM on the 12 cores of a high-end CPU for both single-end and paired-end alignment.

Backtracking; Computer science; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Short read; Computational science; CUDA; Parallel processing (DSP implementation); Hardware and Architecture; Parallelism (grammar); Electrical and Electronic Engineering; General-purpose computing on graphics processing units; Software; ComputingMethodologies_COMPUTERGRAPHICS

IEEE Design & Test

researchProduct

Iterative sparse matrix-vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems

2012

The block Wiedemann (BW) algorithm is frequently used to solve sparse linear systems over GF(2). Iterative sparse matrix–vector multiplication is the most time-consuming operation. The necessity to accelerate this step is motivated by the application of BW to very large matrices used in the linear algebra step of the number field sieve (NFS) for integer factorization. In this paper, we derive an efficient CUDA implementation of this operation by using a newly designed hybrid sparse matrix format. This leads to speedups between 4 and 8 on a single graphics processing unit (GPU) for a number of tested NFS matrices compared with an optimized multicore implementation. We further present…

Block Wiedemann algorithm; Computer Networks and Communications; Computer science; Graphics processing unit; Sparse matrix-vector multiplication; GPU cluster; Parallel computing; GF(2); Computer Science Applications; Theoretical Computer Science; General number field sieve; Matrix (mathematics); Computational Theory and Mathematics; Factorization; Linear algebra; Multiplication; Computer Science::Operating Systems; Software; Integer factorization; Sparse matrix

Concurrency and Computation: Practice and Experience

researchProduct

Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems

2015

This is a post-peer-review, pre-copyedit version of an article published in IEEE/ACM Transactions on Computational Biology and Bioinformatics. The final authenticated version is available online at: http://dx.doi.org/10.1109/TCBB.2015.2389958. [Abstract] High-throughput genotyping technologies (such as SNP-arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively lon…

Computer science; Bioinformatics; DNA Mutational Analysis; Genome-wide association study; Parallel computing; Polymorphism Single Nucleotide; Sensitivity and Specificity; Computational biology; Computer Graphics; Genetics; Computer architecture; Field-programmable gate array; Random access memory; Applied Mathematics; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Field programmable gate arrays; Epistasis Genetic; Signal Processing Computer-Assisted; Equipment Design; Random access memory; Computing systems; Reconfigurable computing; Equipment Failure Analysis; Task (computing); Epistasis; Host (network); Graphics processing units; Genome-Wide Association Study; Biotechnology

researchProduct

Towards an Efficient Implementation of an Accurate SPH Method

2020

A modified version of the Smoothed Particle Hydrodynamics (SPH) method is considered in order to overcome the loss of accuracy of the standard formulation. The summation of Gaussian kernel functions is employed, using the Improved Fast Gauss Transform (IFGT) to reduce the computational cost, while tuning the desired accuracy in the SPH method. This technique, coupled with an algorithmic design for exploiting the performance of Graphics Processing Units (GPUs), makes the method promising, as shown by numerical experiments.

Computer science; Gauss transform; Order (ring theory); Smoothed Particle Hydrodynamics Improved Fast Gauss Transform Graphics Processing Units; Smoothed-particle hydrodynamics; Smoothed Particle Hydrodynamics; symbols.namesake; Improved Fast Gauss Transform; Gaussian function; symbols; Algorithm design; Graphics Processing Units; Graphics; Algorithm; ComputingMethodologies_COMPUTERGRAPHICS

researchProduct

Deep Learning-Based Methods for Prostate Segmentation in Magnetic Resonance Imaging

2021

Magnetic Resonance Imaging-based prostate segmentation is an essential task for adaptive radiotherapy and for radiomics studies whose purpose is to identify associations between imaging features and patient outcomes. Because manual delineation is a time-consuming task, we present three deep-learning (DL) approaches, namely UNet, efficient neural network (ENet), and efficient residual factorized convNet (ERFNet), whose aim is to tackle the fully-automated, real-time, and 3D delineation process of the prostate gland on T2-weighted MRI. While UNet is used in many biomedical image delineation applications, ENet and ERFNet are mainly applied in self-driving cars to compensate for limited hardwar…

Computer science; Graphics processing unit; 02 engineering and technology; Residual; lcsh:Technology; Article; 030218 nuclear medicine & medical imaging; lcsh:Chemistry; deep learning; segmentation; prostate; MRI; ENet; UNet; ERFNet; radiomics; Set (abstract data type); 03 medical and health sciences; 0302 clinical medicine; ENet; ERFNet; 0202 electrical engineering electronic engineering information engineering; General Materials Science; Segmentation; lcsh:QH301-705.5; Instrumentation; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Fluid Flow and Transfer Processes; prostate; Artificial neural network; lcsh:T; business.industry; Process Chemistry and Technology; Deep learning; segmentation; General Engineering; Process (computing); deep learning; UNet; Pattern recognition; lcsh:QC1-999; Computer Science Applications; lcsh:Biology (General); lcsh:QD1-999; lcsh:TA1-2040; radiomics; 020201 artificial intelligence & image processing; Artificial intelligence; Central processing unit; lcsh:Engineering (General). Civil engineering (General); business; lcsh:Physics; MRI

Applied Sciences

researchProduct

Total Variation Regularization in Digital Breast Tomosynthesis

2013

We developed an iterative algebraic algorithm for the reconstruction of 3D volumes from limited-angle breast projection images. Algebraic reconstruction is accelerated using the graphics processing unit. We varied a total variation (TV)-norm parameter in order to verify the influence of TV regularization on the representation of small structures in the reconstructions. The Barzilai-Borwein algorithm is used to solve the inverse reconstruction problem. The quality of our reconstructions was evaluated with the Quart Mam/Digi Phantom, which features so-called Landolt ring structures to verify perceptibility limits. The evaluation of the reconstructions was done with an automatic LR detection a…

Computer science; Graphics processing unit; Inverse; Digital Breast Tomosynthesis; Total variation denoising; Solver; Algebraic number; Algorithm; Regularization (mathematics); Imaging phantom

researchProduct